@nojnhuh
Contributor

@nojnhuh nojnhuh commented Nov 21, 2025

This PR lays the groundwork for the example driver to be able to demonstrate several different kinds of devices, each implemented as a separate "profile." The changes mostly involve extracting the GPU-specific details and exposing handles to plug in different logic.

To see how this works with multiple profiles, I have a POC based on these changes implementing a new gpupart profile which demonstrates partitionable devices: nojnhuh/dra-example-driver@profiles...gpupart

Fixes #92

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: nojnhuh

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. approved Indicates a PR has been approved by an approver from all required OWNERS files. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Nov 21, 2025
@nojnhuh
Contributor Author

nojnhuh commented Nov 21, 2025

I wanted to open this PR before I'm off on vacation next week, but I intend to take a look at #128 first and incorporate those changes here before merging this. FYI @guptaNswati

/hold

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Nov 21, 2025
@nojnhuh
Contributor Author

nojnhuh commented Nov 21, 2025

@sunya-ch If you could take a look to see whether your consumable capacity changes can align with this, that would be great!

@pohly pohly moved this from 🆕 New to 👀 In review in Dynamic Resource Allocation Nov 24, 2025
@sunya-ch sunya-ch left a comment

@nojnhuh
Could we define a DeviceProfile interface/struct with abstract methods like GetDevices, ApplyConfig, and ValidateConfig, and then keep a mapping from profile name to its implementation?
If we take that approach, I could implement ConsumableNetDevProfile or ConsumableGPUProfile independently. What do you think?
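As a rough illustration of the suggestion above, the following is a minimal sketch only. The method names GetDevices, ApplyConfig, and ValidateConfig come from the comment; the `Device` type, the `any`-typed config parameter, the toy `gpuProfile`, and the `lookupProfile` helper are all hypothetical and do not reflect the actual code in this PR.

```go
package main

import (
	"errors"
	"fmt"
)

// Device is a hypothetical stand-in for whatever the driver advertises.
type Device struct {
	Name string
}

// DeviceProfile sketches the interface proposed in the comment.
// The parameter and return types are assumptions made for illustration.
type DeviceProfile interface {
	GetDevices() []Device
	ValidateConfig(config any) error
	ApplyConfig(config any, devices []Device) error
}

// gpuProfile is a toy implementation that advertises a fixed number of devices.
type gpuProfile struct{ count int }

func (p gpuProfile) GetDevices() []Device {
	devices := make([]Device, 0, p.count)
	for i := 0; i < p.count; i++ {
		devices = append(devices, Device{Name: fmt.Sprintf("gpu-%d", i)})
	}
	return devices
}

func (p gpuProfile) ValidateConfig(config any) error {
	if config == nil {
		return errors.New("config must not be nil")
	}
	return nil
}

// ApplyConfig only validates here; a real profile would act on the devices.
func (p gpuProfile) ApplyConfig(config any, devices []Device) error {
	return p.ValidateConfig(config)
}

// profiles maps a profile name to a constructor for its implementation.
var profiles = map[string]func() DeviceProfile{
	"gpu": func() DeviceProfile { return gpuProfile{count: 2} },
}

// lookupProfile returns the implementation registered under name.
func lookupProfile(name string) (DeviceProfile, error) {
	newProfile, ok := profiles[name]
	if !ok {
		return nil, fmt.Errorf("unknown device profile %q", name)
	}
	return newProfile(), nil
}

func main() {
	p, err := lookupProfile("gpu")
	if err != nil {
		panic(err)
	}
	for _, d := range p.GetDevices() {
		fmt.Println(d.Name)
	}
}
```

With a registry like this, a new profile such as ConsumableNetDevProfile would only need to add its own entry to the map.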

@k8s-ci-robot k8s-ci-robot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Dec 2, 2025
@sunya-ch sunya-ch left a comment

@nojnhuh Thank you so much for considering my comment. The update looks good to me. I've left a few comments on things where I'm not sure whether they are intentional or something to be concerned about.


// Validate implements [ConfigHandler].
func (n NoopConfigHandler) Validate(config runtime.Object) error {
	return errors.New("no configuration allowed")
Is it expected behavior to return an error? The error message confuses me a bit: it's unclear whether it means configuration is allowed or that this is an error.

Contributor Author

Overall, the goal of the NoopConfigHandler is to disallow any configuration, so that users can't supply configuration the driver doesn't know how to handle. I can see how the message is ambiguous, though: it can be read as "no (configuration allowed)" or "(no configuration) allowed", so I've reworded it to "configuration not allowed". Returning an error here is intended to help enforce that. Does that help clarify?
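To make the intent concrete, here is a small self-contained sketch of a handler that rejects every configuration. It is an approximation only: `Object` is a hypothetical stand-in for `runtime.Object`, the `ConfigHandler` interface is reduced to a single method, and the `errConfigNotAllowed` variable is an assumption; only the `NoopConfigHandler` name and the reworded message come from this thread.

```go
package main

import (
	"errors"
	"fmt"
)

// Object is a hypothetical stand-in for runtime.Object so the sketch
// compiles without Kubernetes dependencies.
type Object interface{}

// ConfigHandler is a simplified, assumed version of the interface
// discussed in this PR, reduced to the method relevant here.
type ConfigHandler interface {
	Validate(config Object) error
}

// errConfigNotAllowed uses the reworded message from this thread.
var errConfigNotAllowed = errors.New("configuration not allowed")

// NoopConfigHandler rejects any configuration it is given.
type NoopConfigHandler struct{}

// Validate always fails, so a profile without config support never
// silently accepts configuration it cannot handle.
func (NoopConfigHandler) Validate(config Object) error {
	return errConfigNotAllowed
}

func main() {
	var h ConfigHandler = NoopConfigHandler{}
	err := h.Validate("any config value")
	if err != nil {
		fmt.Println("rejected:", err)
	}
}
```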


// ApplyConfig implements [ConfigHandler].
func (n NoopConfigHandler) ApplyConfig(config runtime.Object, results []*resourceapi.DeviceRequestAllocationResult) (PerDeviceCDIContainerEdits, error) {
	return nil, errors.New("no configuration allowed")
Same as Validate. The wording confuses me a bit.

}

// SchemeBuilder implements [profiles.ConfigHandler].
func (p Profile) SchemeBuilder() runtime.SchemeBuilder {
It might be more straightforward to have ConfigHandler implemented separately from Profile.

Contributor Author

I think these are logically coupled enough that it makes sense to keep them together. Tying the profile and config handling together in the profile makes more sense to me than doing it somewhere higher-level like main.go. Implementing them together also makes initialization a bit simpler: "--device-profile => Profile (also implementing ConfigHandler)" vs. "--device-profile => separate Profile and ConfigHandler". I don't think we'd ever want a user to specify the profile and config handler independently.
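The "--device-profile => Profile (also implementing ConfigHandler)" shape could look roughly like the sketch below. Only the `--device-profile` flag name and the idea of a profile that also implements ConfigHandler come from this thread; the interface methods, the `gpuProfile` type, the default value "gpu", and the `profileFromArgs` helper are hypothetical.

```go
package main

import (
	"flag"
	"fmt"
)

// ConfigHandler is an assumed, simplified config-handling interface.
type ConfigHandler interface {
	Validate(config any) error
}

// Profile embeds ConfigHandler, so the single value selected by the
// flag carries both device logic and config handling.
type Profile interface {
	ConfigHandler
	DeviceNames() []string
}

// gpuProfile is a toy profile that accepts any configuration.
type gpuProfile struct{}

func (gpuProfile) DeviceNames() []string     { return []string{"gpu-0"} }
func (gpuProfile) Validate(config any) error { return nil }

// profileFromArgs parses --device-profile and returns the matching profile.
func profileFromArgs(args []string) (Profile, error) {
	fs := flag.NewFlagSet("driver", flag.ContinueOnError)
	name := fs.String("device-profile", "gpu", "name of the device profile to run")
	if err := fs.Parse(args); err != nil {
		return nil, err
	}
	switch *name {
	case "gpu":
		return gpuProfile{}, nil
	default:
		return nil, fmt.Errorf("unknown device profile %q", *name)
	}
}

func main() {
	p, err := profileFromArgs([]string{"--device-profile=gpu"})
	if err != nil {
		panic(err)
	}
	// One lookup yields both the devices and the config handling.
	fmt.Println(p.DeviceNames(), p.Validate(nil))
}
```

Because the flag resolves to one value, there is no way for a user to pair a profile with a mismatched config handler.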

@sunya-ch sunya-ch left a comment

/lgtm

@k8s-ci-robot
Contributor

@sunya-ch: changing LGTM is restricted to collaborators

In response to this:

/lgtm

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

demonstrating advanced features

3 participants